Week 01: LLM Review

Introduction to Large Language Models and AI in Data Analysis

Published

March 20, 2025

Week 01: LLM Review

Introduction to Large Language Models and their applications in data analysis


Learning Objectives

By the end of this session, students will:

  • Understand core concepts and architecture behind large language models (LLMs)
  • Learn how to incorporate AI into data analysis workflows effectively
  • Distinguish between “Cyborg” and “Centaur” approaches to AI collaboration
  • Experience the “jagged frontier” of LLM capabilities through hands-on practice
  • Critically assess capabilities and limitations of AI tools in academic contexts
  • Develop foundational prompt engineering skills for data analysis tasks

Preparation / Before Class

📚 No Required Reading

This is Week 1 - come ready to explore and discuss!

Reflection Preparation:

  • Think about your current experience with AI tools (if any)
  • Consider examples where you’ve encountered AI in your work/studies
  • Identify one data analysis task you find time-consuming or repetitive

Optional Background:

  • Ethan Mollick: “Co-Intelligence: Living and Working with AI” (Chapters 1-2)

Class Material

📊 Core LLM Concepts (60 min)

Slideshow: LLM Concepts and Applications

Key Topics Covered:

  • What are LLMs? Statistical models predicting next tokens from massive training data
  • The Transformer Revolution: How 2017’s “Attention is All You Need” changed everything
  • Context Windows: why this matters for data analysis
  • Training Process: Resources and human feedback
  • Collaborative Frameworks: How to integrate human and AI work

🔧 Hands-on: FT Graph Challenge (25 min)

The Challenge:

Recreate the Financial Times tariff/yield visualization as accurately as possible using AI assistance.

  • Note the sophisticated design: multiple data series, annotations, professional styling
  • Consider: What would you need to recreate this manually vs. with AI assistance?

Learning Approach:

  • Analyze the original - what data sources, what design elements?
  • Use AI to find data and generate initial code
  • Iterate and refine based on results

Discussion Questions

End of Week Reflection:

  1. Personal AI Experience: How have you already incorporated AI into your routine? Which model feels most natural to you?

  2. Error Management: How do you currently deal with AI hallucinations or imperfect answers? What strategies emerged during the FT graph exercise?

  3. The Jagged Frontier: What tasks do you expect AI to excel at? Where do you think it will struggle? Did the visualization exercise match your expectations?

Assignment

Assignment 1: Reproduce the FT Graph

Due: Before Week 2

Choose your approach:

  • Option A (Standard): Use AI to find data and recreate the visualization with same design principles
  • Option B (Advanced): Build an interactive dashboard that updates dynamically

Full Assignment Details

Background, Tools and Resources

🤖 Recommended for Data Analysis:

Platform Strengths Best For Cost
ChatGPT 4o/o1 Excellent coding, data analysis tools R/Python code, statistical methods $20/month
Claude 4 Sonnet Great reasoning, document analysis Research, writing, complex prompts $20/month
GitHub Copilot Integrated coding assistance Real-time code completion $10/month

Free Tier Capabilities:

  • Most course tasks achievable with free versions
  • Paid subscriptions recommended for intensive work – maybe try during this course
  • Start free, upgrade if you hit limits

Detailed Comparison:

AI Model Selection Guide

Academic Integrity and AI use

Course Philosophy:

  • AI as Assistant: Use AI to enhance your capabilities, not replace your thinking
  • Maintain Authority: You remain responsible for all outputs and interpretations
  • Verify Everything: Always validate AI suggestions, especially statistical claims
  • Document Usage: Keep track of how AI helped – useful for methodology sections

Red Lines:

  • Never submit unverified AI output as your own work
  • Always understand the analysis you’re presenting
  • Cite AI assistance appropriately (we’ll discuss standards as they evolve)

The Goal:

Become a more capable data analyst who can leverage AI tools effectively while maintaining scientific rigor.

Next Week:

Week 2 - Data Discovery and Documentation where we’ll use AI to understand and document complex datasets.

Some personal comments on AI and this class

  • I needed to rewrite, edit the slideshow every other month. Bloody hell, this course material is tricky. (While Gosset’s t-test has been around since 1908…)
  • It was an AI (Claude Sonnet 4.0) that suggest to include the last bit on Academic Integrity and it wrote many lines such as having red lines. Hahh.